92 research outputs found

    SiZer for time series: A new approach to the analysis of trends

    Smoothing methods and SiZer are useful statistical tools for discovering statistically significant structure in data. Based on scale space ideas originally developed in the computer vision literature, SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical device to assess which observed features are 'really there' and which are just spurious sampling artifacts. In this paper, we develop SiZer-like ideas in time series analysis to address the important issue of significance of trends. This is not a straightforward extension, since one data set does not contain the information needed to distinguish 'trend' from 'dependence'. A new visualization is proposed, which shows the statistician the range of trade-offs that are available. Simulation and real data results illustrate the effectiveness of the method. Comment: Published at http://dx.doi.org/10.1214/07-EJS006 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Analysis of dependence among size, rate and duration in internet flows

    In this paper we rigorously examine the evidence for dependence among data size, transfer rate and duration in Internet flows. We emphasize two statistical approaches for studying dependence: Pearson's correlation coefficient and the extremal dependence analysis method. We apply these methods to large data sets of packet traces from three networks. Our major results show that Pearson's correlation coefficients between size and duration are much smaller than one might expect. We also find that correlation coefficients between size and rate are generally small and can be strongly affected by applying thresholds to size or duration. Based on Transmission Control Protocol connection startup mechanisms, we argue that thresholds on size should be more useful than thresholds on duration in the analysis of correlations. Using extremal dependence analysis, we draw a similar conclusion, finding remarkable independence for extremal values of size and rate. Comment: Published at http://dx.doi.org/10.1214/09-AOAS268 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
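
As a rough illustration of the thresholding effect mentioned in this abstract, the sketch below computes Pearson's correlation coefficient between flow size and rate, first over all flows and then only over flows above a size threshold. The flow records, the helper names `pearson` and `pearson_above_size`, and the threshold value are all hypothetical; none of them come from the paper or its packet traces.

```python
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def pearson_above_size(flows, min_size):
    """Correlation of size and rate over flows with size >= min_size.

    Each flow is a (size_bytes, duration_seconds) pair; rate = size / duration.
    """
    kept = [(s, s / d) for s, d in flows if s >= min_size and d > 0]
    sizes = [s for s, _ in kept]
    rates = [r for _, r in kept]
    return pearson(sizes, rates)

# Hypothetical flows (size in bytes, duration in seconds), for illustration only.
flows = [(500, 0.5), (1200, 0.4), (3000, 2.0), (40000, 4.0),
         (90000, 30.0), (250000, 10.0), (1000000, 50.0)]

print(pearson_above_size(flows, 0))      # all flows
print(pearson_above_size(flows, 10000))  # only flows of at least 10,000 bytes
```

Comparing the two printed values on real traces would show how sensitive the size-rate correlation is to the chosen threshold; the toy numbers here carry no empirical meaning.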

    Support vector machines with adaptive penalty

    The standard Support Vector Machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection by producing sparse solutions (Bradley and Mangasarian, 1998; Zhu et al., 2003). These learning methods are non-adaptive since their penalty forms are pre-determined before looking at the data, and they often perform well only in a certain type of situation. For instance, the L2 SVM generally works well except when there are too many noise inputs, while the L1 SVM is preferred in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the Lq SVM, where the best q > 0 is automatically chosen by the data. Both two- and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and gives the best performance of this class across a variety of situations. Moreover, we observe that the proposed Lq penalty is more robust to noise variables than the L1 and L2 penalties. An iterative algorithm is suggested to solve the Lq SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.
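
One way to read the Lq SVM idea is as a hinge loss plus a penalty of the form lam * sum(|w_j|^q), with q > 0 selected from the data. The sketch below only evaluates such an objective for a linear classifier. The abstract does not state the exact objective, the rule for choosing q, or the iterative algorithm, so the function names, the `lam` parameterization, and the toy data are assumptions made for illustration.

```python
def hinge_loss(w, b, X, y):
    """Average hinge loss of the linear rule sign(w.x + b); labels are +1 or -1."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total += max(0.0, 1.0 - margin)
    return total / len(X)

def lq_penalty(w, q):
    """Sum of |w_j|**q; q=1 gives the L1 penalty, q=2 the squared L2 penalty."""
    if q <= 0:
        raise ValueError("q must be positive")
    return sum(abs(wj) ** q for wj in w)

def lq_svm_objective(w, b, X, y, lam, q):
    """Hinge loss plus lam times the Lq penalty (assumed form, see lead-in)."""
    return hinge_loss(w, b, X, y) + lam * lq_penalty(w, q)

# Toy two-class data, hypothetical and for illustration only.
X = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.3], [-2.5, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.5], 0.0

# Evaluate the same classifier under several candidate values of q.
for q in (0.5, 1.0, 2.0):
    print(q, lq_svm_objective(w, b, X, y, lam=0.1, q=q))
```

An adaptive procedure would compare fitted models across a grid of q values, for example by validation error, rather than fixing q in advance; that selection step is not shown here.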

    Multiscale Exploratory Analysis of Regression Quantiles Using Quantile SiZer

    The SiZer methodology proposed by Chaudhuri & Marron (1999) is a valuable tool for conducting exploratory data analysis. Since its inception, different versions of SiZer have been proposed in the literature. Most of these SiZer variants target the mean structure of the data and cannot provide any information about the quantile composition of the data. To fill this need, this article proposes a quantile version of SiZer for the regression setting. By inspecting the SiZer maps produced by this new SiZer, real quantile structures hidden in a data set can be revealed more effectively, while spurious features are filtered out. The utility of this quantile SiZer is illustrated via applications to both real data and simulated examples.

    Support vector machines with adaptive Lq penalty

    The standard support vector machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection by producing sparse solutions [Bradley, P., Mangasarian, O., 1998

    Visualization and inference based on wavelet coefficients, SiZer and SiNos

    SiZer (SIgnificant ZERo crossing of the derivatives) and SiNos (SIgnificant NOnStationarities) are scale-space based visualization tools for statistical inference. They are used to discover meaningful structure in data through exploratory analysis involving statistical smoothing techniques. Wavelet methods have been successfully used to analyze various types of time series. In this paper, we propose a new time series analysis approach, which combines wavelet analysis with the visualization tools SiZer and SiNos. We use certain functions of wavelet coefficients at different scales as inputs, and then apply SiZer or SiNos to highlight potential non-stationarities. We show that this new methodology can reveal hidden local non-stationary behavior of time series that is otherwise difficult to detect.

    Long-range dependence in a changing Internet traffic mix

    This paper provides a deep analysis of long-range dependence in a continually evolving Internet traffic mix by employing a number of recently developed statistical methods. Our study considers time-of-day, day-of-week, and cross-year variations in the traffic on an Internet link. Surprisingly large and consistent differences in the packet-count time series were observed between data from 2002 and 2003. A careful examination, based on stratifying the data according to protocol, revealed that the large difference was driven by a single UDP application that was not present in 2002. Another result was that the observed large differences between the two years showed up only in packet-count time series, and not in byte counts (while conventional wisdom suggests that these should be similar). We also found and analyzed several of the time series that exhibited more “bursty” characteristics than could be modeled as Fractional Gaussian Noise. The paper also shows how modern statistical tools can be used to study long-range dependence and non-stationarity in Internet traffic data

    Dependent SiZer: Goodness-of-Fit Tests for Time Series Models

    In this paper, we extend SiZer (SIgnificant ZERo crossing of the derivatives) to dependent data for the purpose of goodness of fit tests for time series models. Dependent SiZer compares the observed data with a specific null model being tested by adjusting the statistical inference using an assumed autocovariance function. This new approach uses a SiZer type visualization to flag statistically significant differences between the data and a given null model. The power of this approach is demonstrated through some examples of time series of Internet traffic data. It is seen that such time series can have even more burstiness than is predicted by the popular, long range dependent, Fractional Gaussian Noise model

    Experimental Investigation of the Effects of Concrete Alkalinity on Tensile Properties of Preheated Structural GFRP Rebar

    The combined effects of preexposure to high temperature and alkalinity on the tensile performance of structural GFRP reinforcing bars are experimentally investigated. A total of 105 GFRP bar specimens are preexposed to high temperatures between 120°C and 200°C and then immersed in an alkaline solution of pH 12.6 for 100, 300, and 660 days. From the test results, the elastic modulus obtained at 300 immersion days is almost the same as that at 660 immersion days. For all alkali immersion days considered in the test, the preheated specimens provide a slightly lower elastic modulus than the unpreheated specimens, with a maximum difference of only 8%. The tensile strength decreases in all test cases as the alkaline immersion time increases, regardless of the preheating level. The tensile strength of the preheated specimens is about 90% of that of the unpreheated specimens for 300 alkali immersion days. However, after 300 alkali immersion days the tensile strengths are almost identical to each other. Such results indicate that the tensile strength and elastic modulus of the structural GFRP reinforcing bars are closely related to alkali immersion days and not much related to the preheating levels. The specimens show a typical tensile failure around the preheated location.